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Abstract: Feynman loop integrals are a key ingredient for the calculation of higher order radiation effects, and 
are responsible for reliable and accurate theoretical prediction. We improve the efficiency of numerical integration in 
sector decomposition by implementing a quasi-Monte Carlo method associated with the CUDA/GPU technique. For 
demonstration we present the results of several Feynman integrals up to two loops in both Euclidean and physical 
kinematic regions in comparison with those obtained from FIESTA3. It is shown that both planar and non-planar 
two-loop master integrals in the physical kinematic region can be evaluated in less than half a minute with C>(1CE 3 ) 
accuracy, which makes the direct numerical approach viable for precise investigation of higher order effects in multi¬ 
loop processes, e.g. the next-to-leading order QCD effect in Higgs pair production via gluon fusion with a finite top 
quark mass. 
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1 Introduction 

The scalar particle predicted by the Standard Model, 
the Higgs boson, has been finally discovered at the CERN 
Large Hadron Collider (LHC)[l, .2}. This milestone has 
inspired various exciting investigations on the further 
details of the Higgs boson and related research. Re¬ 
cently the launch of LHC Runll at 13 TeV collision 
energy brings physics exploration to a new era. With 
the highest ever center-of-mass energy and luminosity, 
many scattering processes that potentially answer some 
fundamental physics questions will be able to reach an 
accuracy in the percent range or better, while the appro¬ 
priate analysis on high precision data demands that the 
uncertainties of theoretical prediction reach the same ac¬ 
curacy. In order to achieve this, higher order QCD effects 
must be included in theoretical predictions. For instance, 
state-of-the-art investigations on Higgs inclusive produc¬ 
tion have explored next-to-next-to-next-to-leading order 


(NNNLO) effects for the quark pair annihilation initial 
state 0 and for the gluon fusion initial state [I * 1 ] • Mean¬ 
while, next-to-next-to-leading order (NNLO) theoretical 
predictions have been provided for Higgs pair production 
[5] and the associated production of Higgs with jel |7i -s' 
or vector boson [Mt As the accumulated luminosity 
at the LHC increases, the investigation of higher order 
QCD effects will be wanted for more processes, e.g. top 
quark production and jet cross sections. In the higher 
order effects, one of the most important ingredients is 
virtual correction, which always relies on evaluation of 
Feynman loop integrals. 

After decades of effort, various algorithms have been 
proposed for evaluating Feynman loop integrals, includ¬ 
ing both analytical and numerical approaches. The ana¬ 
lytical approaches can provide explicit expressions for 
Feynman integrals, and can further reveal significant 
physical characteristics. However, when complicated 
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processes are encountered, it becomes difficult to obtain 
analytical solutions, while the numerical approaches can 
solve more challenging problems in spite of a heavy bur¬ 
den of evaluation time. Sector decomposition, one of the 
numerical algorithms, was introduced as a systematic ap¬ 
proach by Binoth and Heinrich [Til [lif . Following a cer¬ 
tain choice of decomposition strategies, this algorithm 
divides the domain of loop integration into sectors. In 
each individual sector, proper transformation of integra¬ 
tion variables is performed to explicitly reveal the ultra¬ 
violet (UV) and infrared (IR) singularities. Ultimately 
the coefficients of a Laurent series in e of the Feynman 
integral can be evaluated numerically. Initially, sector 
decomposition was implemented for the Feynman inte¬ 
gral in the Euclidean kinematic region 13H15|, where the 


Cauchy singular integral can be avoided. Later, inspired 
by Nagy and Soper ld[17j, integration contour deforma¬ 
tion was proposed [18( as a systematic scheme to extend 
sector decomposition to the physical kinematic region. 

Several programs fl9lT2ll | have implemented the sec¬ 
tor decomposition algorithm for the numerical evalu¬ 
ation of Feynman loop integrals. Normally they use 
Monte Carlo (MC) integration methods, which have been 
widely used in high energy physics research. For in¬ 
stance, Vegas [22l|. an adapative Monte Carlo method, 
can achieve an integration convergence rate of 0(1/ y/n). 
In this paper, we implement the quasi-Monte Carlo 
(QMC) method for the numerical evaluation of the inte¬ 
grals in sector decomposition. As a better choice, QMC 
can have a convergence rate close to d(n _1 ) for differen¬ 
tiable integrands. Furthermore, we adopt the technique 
of CUDA/GPU to improve the performance of numerical 
evaluation. 

This paper is organised as follows. In Section II we 
review the sector decomposition algorithm, then Sec¬ 
tion III gives a brief description of the QMC integration 
method. In Section IV we compare the performance of 
our program with FIESTA (20I] . Our conclusions are then 
presented in the final section. 


After Feynman parameterisation and integration over 
the loop momenta, one can obtain 




T(N v -LD/2) 

nLrfo) Jo 


poo poo 


^[dx, x? 


\ U N "- (L+1)D/2 (^-,x n ) 


where N„ = and U and F are polynomials of 

{xi} and can be straightforwardly derived from the mo¬ 
mentum representation, or constructed from the topol¬ 
ogy of the corresponding Feynman graph [23j. 

Further treatment of the Feynman integral requires 
careful consideration since U and F can vanish when 
some of {xi} approach zero, which may be related to 
UV or IR divergence. Direct numerical integration is 
impossible for divergent integrals. A sector decomposi¬ 
tion algorithm is designed to systematically extract the 
divergence, and is briefly described as follows[241. 

Firstly, the integration domain is equally split into N 
sub-domains, which are called primary sectors: 


) N „ 0 o N 

d Jv x = ^ / d^x — x :j ). (3) 


3=1 

37 s * 


Then in the i-th sector we implement the variable trans¬ 
formation, 


Xitj 

j<i, 


Xi 

j — h 

(4) 

X^tj — 1 

j>i. 



Thereafter X; integration is performed associated with 
the step function, and now the Feynman integral can be 
expressed as 


2 sector decomposition 

Generically an L-loop Feynman integral has the fol¬ 
lowing representation: 


where 


/ = ( _ ir .£^ziA 2 E/ „ (5) 

rL=ii>, ) v 
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L r\ D k 1 N 
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i= l R J “=1 (<?3 2 - m 3 2 + i£ ) Vi ’ 
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where D = 4— 2e; qj is the momentum of relevant inter¬ 
nal propagator, and is a linear combination of the loop 
momenta {fcj} and external momenta; rrij is the mass of 
the relevant internal propagator; and u :l is the power of 
the corresponding propagator. 


I,= 



n dt i % 


T tN„-(L+1)D/2,. , \ 

V 4-1 \ u i Rii''' >Uv- 1 ) 


F, 


N V -LD /2 


(t 1 ,’ ’ ’ ,tjv- 1 ) 

_ ( 6 ) 

Obviously for any given primary sector /;, the domain of 
integration is an (N — l)-dimensional unit cube. 

Next;_ following an iterative decomposition 


strategy fljj 


or geometric strategy 


25 


each pri¬ 


mary sector is finally divided into some subsectors {h a } 
so that in any subsector polynomials Ui and F/ can be 
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factorised into the form 

N-l 

Ul =Cla(l + Hl a (tl,--- ,fjV-l)) tj 

3=0 

N-l 

3=0 


(7) 

( 8 ) 


after proper variable transformation. In each subsector, 
the new variables and Jacobian generated by the trans¬ 
formation are required to be monomials of original vari¬ 
ables, and meanwhile the transformation projects the do¬ 
main of integration to the (N— l)-dimensional unit cube. 
In above expressions, b la j and b' la:j are non-negative in¬ 
tegers. H, n and H!„ are polynomials of |T) such that 
Hia(0, ■ ■■ ,0) = 0 and U/ a (0, ■ ■ • ,0) = 0. 

Now the primary sector becomes the combination of 
subsectors, 


h=J2 D ‘« 


1 d tj t 


01aj e 


<3 = 1 


(1 ’ 


(9) 


where the powers of tj are collected into ai a j+/3i a je, and 
D la contains the coefficients from the Jacobian and Ci a 
and C[ a . 

In the practical evaluation of Feynman integrals, we 
will adopt the geometric strategy since it can be guar¬ 
anteed to succeed and results in the smallest number of 
subsectors. 

After sector decomposition, the singularities in the 
Feynman integral have been collected into the regulators 
in the form of t a+ ^ e , which can explicitly present the pole 
of the integral by using a Laurent series or integration by 
parts (IBP). Without loss of generality, we rewrite the 
integral with a certain regulator as 

1= [ df t a+Pc f(t,e), (10) 

Jo 


where /(0,e) is non-zero and finite. Then if a< —1, the 
above integral contains a singularity on the lower bound. 
By expanding f(t,e) into a Laurent series around t = 0, 
the singularity can be explicitly extracted as 


1 = 


M—i 

\ ' 


/ (p) ( 0,e) 


„ a+p + l + /3e 

p =0 


P- 


h f dtf“ +/3 V(f,e), (11) 
Jo 


where 

r(t,e) = f(t,e)~Y / fM( 0,e)~ (12) 

Thereafter, the integrands can be expanded by small e, 
and the coefficients of the Laurent series in e can be eval¬ 
uated numerically order by order. However, numerical 


evaluation of r(t) may suffer numerical instability from 
large number cancellation. An alternative approach to 
the pole extraction that can avoid this problem is IBP, 


[ d t t a+ ^f(t,e)= l f( l,e)- 
J o a + pe + l 


a + fie + V 
/- 1 0/(t,e) t°‘ +f,e+1 

q Ot q. + f3a +1 


(13) 


It can be seen that the power of t increases by one. 
Therefore, by repeating the above IBP formula enough 
times, the power of t will not generate singularity, and 
then the numerical approach can be implemented to eval¬ 
uate the coefficients of the Laurent series in e. 

However, occasionally the integral contains Cauchy 
singularities in the physical kinematic region. In this 
case, the sign of F cannot be guaranteed definite, so the 
Cauchy singular Feynman integral is only valid under 
a proper contour according to the conventional is pre¬ 
scription of the Feynman propagators. Practically such 
an infinitesimal shifted contour will sabotage the stabil¬ 
ity of numerical integration. Fortunately an interesting 
prescription of contour deformation has been proposed 

z i (t)=ti-iX i t i (l-t i ) d ^ t , (14) 

where an appropriate choice of can guarantee the sign 
of Im(.F(,?)) is always negative as required convention¬ 
ally. 


3 Quasi-Monte Carlo 


After the implementation of the sector decomposition 
algorithm reviewed in Section II, the Feynman integral 
is expressed as a Laurent series in e. The coefficients of 
the series are composed of convergent integrals, which 
can be numerically evaluated. A one-dimensional inte¬ 
gral can be easily evaluated by a numerical approach 
such as the trapezoidal rule, while numerical evaluation 
of multi-dimensionals integral is usually much more dif¬ 
ficult. 

For instance, an s-dimensional integral can be writ¬ 
ten as 


W)= f d °xf(x). (15) 

J[ o,i] s 

Given a predefined set of n points {£,1 ay e [0,l] s ; i = 
0,...,n— 1}, the above integral can be estimated by 

n-l 

Q«,a(/) = -E/(^W ( 16 ) 

n z ' 

i=0 


This method is called the quasi-Monte Carlo method, 
and the point set is called the quasi-Monte Carlo rule [281. 
Conventionally two families of QMC rules attract most 
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interest. One consists of digital nets and digital se¬ 
quences, while the other is the lattice rule. In this paper 
we adopt the rank-1 lattice rule (R1LR) defined by[28[ 


However, in practice even when the integrand / is a non¬ 
periodic function, one can implement the transformation 
Xi = 3 y\ — 2 y\ to obtain a periodic integrand as below: 


^i= * = 0,...,n-l, (17) 

where z, known as the generating vector, is an s- 
dimensional integer vector. The integer components of 
z are all relatively prime to n. The braces around the 
vector in Eq. ED means only the fractional part of each 
component in the vector is taken. 

The previous R1LR will result in a biased estimation, 
since it is fully deterministic. To achieve an unbiased re¬ 
sult, we need to introduce appropriate randomisation. 
For R1LR, we can use the simplest form of randomisa¬ 
tion, called shifting, which will yield so-called shifted lat¬ 
tice rule. The QMC algorithm utilising random shifted 
R1LR is explained as below [28j. 

1. A set of m independent random vectors called 
shifts, {A 1; • • • , A m }, is generated with uniform dis¬ 
tribution in [0,l] s . 

2. For each shift, the integral estimation in Eq. ED is 

repeated to obtain a set of m integral estimations, 
{Qn,s,i(/) 5 ”' where 

Qn, s ,fc(/) = ^^/Q^ + A fc j^, k = l,...,m. 

(18) 


L(f) = / d s xf(x) 

4[0,lp 

= [ d s yf{x{y))f[{6yi(l-yi)) (21) 

UMP i=1 

= / d a yg(y): 

7[o,i] s 

where it can be seen that g(y) = 0 once y reaches bound¬ 
ary of [0,l] s , as long as / is bounded at the edge. 

Beside the 0(n _1 ) convergence rate, the shifted 
R1LR QMC method has some intrinsic advantages. In 
the shifted R1LR QMC method the lattice rule is deter¬ 
ministic and therefore the complexity of random number 
generation only depends on the number of shifts. By con¬ 
trast, in the MC method, since the evaluation points are 
independent random vectors, a large amount of pseudo 
random number generation consumes much more of the 
GPU resources. Besides, since the QMC method is a 
non-adaptive method, it can easily deal with integrals in 
complex space, which is inevitable for Feynman integrals 
in the physical kinematic region. However, for adaptive 
algorithms, e.g. Vegas, it is difficult to define an appro¬ 
priate rule to handle such integrals. 

4 Numerical Results 


3. Then the average of these m integral estimations 

^ m 

Qn,s,m(f) = -Y / Qn,sAf), ( 19 ) 

m ti 

is finally taken as an unbiased approximation of the 
integral /„(/). 

4. Furthermore, an unbiased estimation for the mean- 
square error of Qn, s ,m(f) can be obtained by 


1 

m(m— 1) 


^(Qn, S , fe (/)-Q W (/)) 2 - 

k= 1 


( 20 ) 


In this section, we present some numerical results for 
several Feynman integrals up to two loops for certain 
choices of kinematic parameter as a demonstration. In 
the Euclidean kinematic region we show the evaluation of 
massless scalar double box diagrams, and in the physical 
kinematic region we take several master integrals from 
the investigation of Higgs pair production via gluon fu¬ 
sion. The Higgs and top quark masses are chosen as 
Mb = 125 GeV and M t = 172 GeV jUj]. By compar¬ 
ing with FIESTA3[2p| using the Vegas algorithm 22, 321 
as a benchmark of CPU efficiency, we illustrate the im¬ 
provement in the efficiency of numerical evaluation of 
Feynman integrals. The sample codes for the Feynman 
integrals in this section were generated by MIRACLE, 
which is a general purpose package in preparation. 


The above algorithm improves the practicability of 
the R1LR QMC method. Moreover, in the case that 
the integrand / is a 1-periodic function and all par¬ 
tial derivatives of / exist, the convergence rate can be 
improved to 0(n~ 1 ) [29|, ;3Q] if the generating vector is 
obtained by the component-by-component construction. 


4.1 One-loop Feynman integral in physical kine¬ 
matic region 

The leading order (LO) contribution to Higgs pair 
production via gluon fusion contains the one-loop box 
Feynman diagram as shown in Fig. [l] For the evaluation 


* Our program was deployed with NVIDIA Tesla K20 GPU, while FIESTA3 used four cores of Intel Core i7 3770 CPU (3.4GHz). 
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of this diagram, the most complicated master integral is 

_ r _d°k _1_ 

A J ( 27 t) d (— k 2 + M 2 + *e)[— (k+Pi) 2 + M 2 + ie\ 

1 

[- (k-p 2 ) 2 + M 2 + ie][-{k+p 1 -p 4 ) 2 + M 2 + ie]' 

( 22 ) 



duction via gluon fusion, where the initial state 
momenta are incoming and the final state mo¬ 
menta are outgoing. 


The initial states are on-shell gluons p\ = p 2 = 0, and 
the final states are on-shell Higgs bosons p 2 =p\ = M\. 
The Mandelstam variables are defined as (pi+p 2 ) 2 = s 
and (p 2 -p 3 ) 2 = t. 0 

As shown in Fig. [2j we evaluate 1000 points between 
s = 70000 and s = 500000, while t = —6000 is fixed. 
The average time to evaluate one point is 83 ms, which 
is acceptable for practical calculation of one-loop Feyn¬ 
man integrals. We can find that the threshold effect is 
explicitly shown at s = 4M 2 = 118336. Below the thresh¬ 
old Re {Ia) vanishes since at the moment the sign of the 
F-term in each decomposed subsector is definite. Mean¬ 
while Im(/ A ) peaks on the threshold, where we can see 
an obvious large relative error due to slow convergence 
of some integrals. Fig. [2] also presents the comparison 
with results from LoopToolsj3^. It can be seen that the 
relative errors of our results are within 10~ 3 , and the 
relative errors can be smaller than 10 -4 for most of the 
points. 


xl(T 12 xl0“ 12 




xl(T 



12 3 4 

(c) Relative error of Hjs(Ia) 


s/10 



2 3 4 5 

(d) Relative error of Imf/a) 


s/10 


Fig. 2. Comparison of single box master integral with LoopTools. 


t For simplicity, in the following the dimension of scale is set as GeV by default. 
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4.2 Two-loop Feynman integral in Euclidean 
kinematic region 



written as 


Ib — 


' d D fci d °k 2 


1 


D_ D_ (— k\ +ie)[-(k 1 —p^+ie] 
in 2 * 7 t 2 

1 

[-(fci +p 2 ) 2 + ie][-(k 1 - k 2 -pi_) 2 +ie\ 

1 

(-ki + ie)[-(k 1 +p 2 - k 2 +p 3 ) 2 +*e] 

1 


[—(fci — k 2 +p 2 ) 2 + ie] 


(23) 


incoming. 


Because this Feynman integral contains divergences, the 
results are expressed in form of a Laurent series in e, 


For the demonstration of a two-loop Feynman inte¬ 
gral in the Euclidean kinematic region, we choose the 
massless double box diagram with two legs off-shell 14[, 
as shown in Fig. [3] Explicitly this Feynman integral is 


i=4 

I B =e~ 2 ^ E s~ 3 ~ 2e ^2 

i =0 


Ei 

e* 


(24) 


During the numerical evaluation, the Mandelstam vari¬ 
ables are set as s = (pi+p 2 ) 2 = —2/3, t = (p 2 +p 3 ) 2 = —2/3, 
and u = (pi +p 3 ) 2 = -2/3. 


Table 1. Comparison of double box Feynman integral in Euclidean kinematic region. 


(pi^p^pbpi) 

(-1,0,0,-!) 

(o,-: 

1,0,-1) 

(0,0,-1,-1) 


Vegas/CPU 

QMC/GPU 

Vegas/CPU 

QMC/GPU 

Vegas/CPU 

QMC/GPU 

Pa 

0.25±3 x 10 -6 

0.25i 1 x 10 -7 

0 

0 

0.2501 i 0.0001 

0.25i2 X 10 -7 

P 3 

0.40547± 0.00004 

0.40546 i 0.00006 

0 

0 

0.4057 i 0.005 

0.40544 i 0.00003 

P2 

0.6500T0.0003 

0.6489 i 0.0005 

1.0582i0.0001 

1.05799i0.00003 

3.118i0.002 

3.118i0.001 

Pi 

-1.183F0.001 

— 1.1823i0.0006 

1.0938 i 0.0005 

1.0947 i 0.0009 

12.522 i 0.007 

12.533i0.007 

Po 

—8.798i0.004 

-8.801 i 0.005 

-3.000i0.001 

-3.003i0.002 

35.60i0.03 

35.60i0.03 

Integration Time 

500s 

2.2s 

45s 

0.84s 

117s 

2.2s 


In TableHJ we compare our results (marked as QMC) 
with those from FIESTA3 [2 q|. It can be seen that all 
the results are consistent to 0(1O -3 ), while our program 
is much (about 50-200 times) faster than FIESTA3. This 
implies that our program can provide the results of Feyn¬ 
man integrals with much higher accuracy in the Eu¬ 
clidean kinematic region. One of the reasons for the 
improvement is the QMC algorithm, which has been 
proved to have better conver genc e rate compared to con¬ 
ventional MC algorithm [29j, |3(| ■ Another reason is the 
implementation of the GPU-accelerated computing. A 
GPU has a specified parallel architecture consisting of 
thousands of smaller, more efficient cores designed for 
handling multiple tasks simultaneously. In contrast, a 
CPU consists of a few cores optimized for sequential se¬ 
rial processing. The combination of the QMC algorithm 
and GPU-accelerated computing provide an efficient way 
to evaluate the Feynman integrals. 


4.3 Two-loop Feynman integral in physical kine¬ 
matic region 



Fig. 4. Two-loop double box Feynman diagram for 
Higgs pair production via gluon fusion, where the 
initial state momenta are incoming and the final 
state momenta are outgoing. 


The next-to-leading order (NLO) contribution to 
Higgs pair production via gluon fusion contains the two- 
loop double box Feynman diagram as shown in Fig. UJ 
which is one of the challenges for an analytical approach 
with a finite top quark mass. To evaluate this diagram, 
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we will confront the complicated master integral 
1 (p j tl c p i~ i 


Ic = 


£_ D. {—kl+ie)[—{k 1 —p 1 ) 2 +ie\ 
in 2 in 2 

1 

[— (fei +P 2) 2 -Me] [—(Ah - k 2 -pi) 2 +M 2 +ie] 

1 

(— kl + M 2 + ie) [-(fei +p 2 - k 2 -p 3 ) 2 + M 2 + ie\ 
[- (fci - k 2 +p 2 ) 2 + M 2 + ie]' 


This Feynman integral contains IR divergences, so we 
express the results in form of a Laurent series in e, 

1=2 p 

I c = e~ 2tlE s -3-2e ^. (26) 

i=0 £ 

Here the initial states are on-shell gluons p 2 and 

the final states are on-shell Higgs bosons p 2 =p\ = M^. 
The Mandelstam variables are set as s = (pi +p 2 ) 2 = 
160000 and t = (p 2 -p 3 ) 2 = -6000. 


Table 2. Comparison of two-loop double box Feynman integral in physical kinematic region. 



Vegas/CPU 

QMC/GPU 

P 2 

-7.959 ± 0.009 - 10.586* ± 0.009* 

—7.949±0.003— 10.585*±0.005* 

Pi 

3.9T0.1 —28.1*±0.1* 

3.831 ± 0.005 - 28.022* ±0.005* 

Po 

-3.9 ±0.8+ 92.3* ±0.8* 

—4.63 ±0.07 ±92.13* ±0.07* 

Integration Time 

45540s 

19s 


As shown in Table [2j after more than 12 hours, the 
relative error of the finite term from FIESTA3 can only 
reach 10 -2 . By comparison, in 19 seconds our program 
can obtain an accuracy of about 10^ 3 , which may cost 
FIESTA3 more than a thousand hours. Moreover, com¬ 
pared with the efficiency improvement obtained in the 
Euclidean kinematic region in Section [4T2j we can find a 
much better improvement in the physical kinematic re¬ 
gion due to the advantages of the QMC algorithm for 
complex integrals as explained in Section [3j Besides, it 
can be seen that the efficiency of our program for two- 
loop Feynman integrals makes the numerical approach 
viable for the evaluation of the NLO virtual contribu¬ 
tion to Higgs pair production via gluon fusion with a 
finite top quark mass. Therefore an investigation of the 
finite top quark mass effect in Higgs pair production can 
be accomplished within months. 

4.4 Non-planar Two-loop Feynman integral in 
physical kinematic region 



Fig. 5. Non-planar two-loop double box Feynman 
diagram for the Higgs pair production via gluon 


fusion, where the initial state momenta are incom¬ 
ing and the final state momenta are outgoing. 


The non-planar two-loop diagram shown in Fig. [5] 
also contributes to Higgs pair production via gluon fu¬ 
sion at NLO. During the evaluation of this diagram, the 
most complicated master integral is 


In — 


' d^/cj d °k 2 


1 


£_ (-fcf + *e)[-(fci+pi) 2 -l-i£] 
in 2 in 2 


1 


[-(k 1 -p 2 ) 2 + ie\[-(k 1 +p 1 -k 2 ) 2 +M 2 + ie\ 

1 

{—kl + M 2 +ie)[-{k 2 -p 4 ) 2 + M 2 + ie] 

1 


[-{k 1 +p 1 -k 2 -p 3 ) 2 + M 2 + ie\ ’ 


(27) 


which contains IR divergences, and can be expressed in 
form of a Laurent series in e, 


i= 2 

/ D = e - 2 ^ Es -3-2 'Y*' 


(28) 


The same as the configuration in Section 14.31 the initial 
states are on-shell gluons p\=pl = 0, and the final states 
are on-shell Higgs bosons p\=p\ = M^-. The Mandel¬ 
stam variables are set as s = {p 3 +p 2 ) 2 = 160000 and 
t = {p 2 —p 3 ) 2 = —6000. 


Table 3. Comparison of non-planar two-loop double box Feynman integral in physical kinematic region. 
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Vegas/CPU 

QMC/GPU 

P2 

—3.848 ±0.004 +0.0005* ±0.003* 

-3.8482 ± 0.0007 + 0.0004* ± 0.0003* 

Pi 

3.81 ±0.03-6.41* ±0.03* 

3.83 + 0.02-6.40*±0.02* 

Po 

77.2 + 0.2 + 20.1*±0.2* 

77.2 ±0.1+ 19.9* ±0.1* 

Integration Time 

54290s 

20s 


In Table [3j we can see that our program can obtain 
a result with 0(1O -3 ) accuracy in 20 seconds, while the 
relative error of the FIESTA3 result takes over 15 hours 
to reach near that order of accuracy. By comparing with 
the results in previous section, it is obvious that the 
non-planar double-box master integral has a slower con¬ 
vergence rate. This is consistent with the conventional 
conclusion that the evaluation of non-planar diagrams is 
more difficult than planar ones. Nonetheless, it can be 
seen that efficiency of our program for the evaluation of 
non-planar master integrals is acceptable for the practi¬ 
cal numerical approach, which can provide NLO numer¬ 
ical results for Higgs pair production within months. 

5 Conclusion 

We have implemented the shifted R1LR QMC 
method associated with CUDA/GPU to numerically 


evaluate Feynman loop integrals by using a sector de¬ 
composition algorithm. Some examples are presented 
to show the promising efficiency of our program on the 
numerical evaluation of Feynman loop integrals. For a 
one-loop box Feynman integral, we could obtain an accu¬ 
racy of about 10 -4 in tens of milliseconds. For two-loop 
double box Feynman integrals, the accuracy can reach 
about 10 -3 in several seconds in the Euclidean kinematic 
region, while in the physical kinematic region less than 
half a minute is needed. The efficiency of our program 
can make the direct numerical approach viable for the 
precise investigation on some important processes, e.g. 
Higgs pair production via gluon fusion at NLO with the 
finite top quark mass effect. 
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